Categorization of Turkish News Documents with Morphological Analysis

Authors

  • Burak Kerim Akkus
  • Ruken Cakici

Abstract

Morphologically rich languages such as Turkish may benefit from morphological analysis in natural language tasks. In this study, we examine the effects of morphological analysis on the text categorization task in Turkish. We use stems and word categories extracted with morphological analysis as the main features and compare them with fixed-length stemmers in a bag-of-words approach with several learning algorithms. We aim to show the effects of using varying degrees of morphological information.
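
The fixed-length-stemmer baseline mentioned above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract gives neither the prefix length nor the learning algorithms, so the prefix length k = 5 is an assumption, and a from-scratch multinomial Naive Bayes (with add-one smoothing) stands in for "several learning algorithms". The helper names (turkish_lower, fixed_length_stem, bag_of_words, train_nb, predict_nb) and the toy documents are hypothetical. The morphological-analysis features (stems and word categories from an analyzer) are not shown.

```python
import math
import re
from collections import Counter


def turkish_lower(text: str) -> str:
    # str.lower() maps "I" to "i" and "İ" to "i" + combining dot,
    # so the Turkish dotted/dotless I pair is handled explicitly first.
    return text.replace("İ", "i").replace("I", "ı").lower()


def tokenize(text: str) -> list[str]:
    # \w is Unicode-aware in Python 3, so ç, ğ, ı, ö, ş, ü are kept.
    return re.findall(r"\w+", turkish_lower(text))


def fixed_length_stem(token: str, k: int = 5) -> str:
    # A fixed-length stemmer keeps only the first k characters of a word
    # (k = 5 is an assumption, not a value taken from the paper).
    return token[:k]


def bag_of_words(text: str, k: int = 5) -> Counter:
    # Order-free counts of the truncated stems.
    return Counter(fixed_length_stem(t, k) for t in tokenize(text))


def train_nb(docs, labels, k=5):
    # Multinomial Naive Bayes: per-class document counts and stem counts.
    class_counts = Counter(labels)
    term_counts = {c: Counter() for c in class_counts}
    vocab = set()
    for doc, label in zip(docs, labels):
        bow = bag_of_words(doc, k)
        term_counts[label].update(bow)
        vocab.update(bow)
    return class_counts, term_counts, vocab


def predict_nb(model, doc, k=5):
    class_counts, term_counts, vocab = model
    n_docs = sum(class_counts.values())
    bow = bag_of_words(doc, k)
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        total = sum(term_counts[label].values())
        score = math.log(count / n_docs)  # log prior of the class
        for term, freq in bow.items():
            if term not in vocab:
                continue  # ignore stems never seen in training
            # Add-one (Laplace) smoothed probability of the stem given the class.
            p = (term_counts[label][term] + 1) / (total + len(vocab))
            score += freq * math.log(p)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy corpus, not data from the paper.
TRAIN_DOCS = [
    "Takım maçı kazandı.",
    "Futbolcular maçta gol attı.",
    "Borsa bugün yükseldi.",
    "Ekonomi ve borsa haberleri.",
]
TRAIN_LABELS = ["sports", "sports", "economy", "economy"]

if __name__ == "__main__":
    model = train_nb(TRAIN_DOCS, TRAIN_LABELS, k=5)
    print(predict_nb(model, "Takımlar maçları kazandılar", k=5))  # sports
    print(predict_nb(model, "Borsa yükseliyor", k=5))  # economy
```

Truncation lets inflected forms such as "takımlar" and "takım" share the feature "takım" without any morphological analyzer, which is what makes it a cheap baseline. It also cuts some forms inconsistently ("maçı" and "maçları" become "maçı" and "maçla"), and that is the kind of case where analyzer-derived stems could differ.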

Similar resources

Arabic News Articles Classification Using Vectorized-Cosine Based on Seed Documents

Besides its own merits, text classification (TC) has become a cornerstone in many applications. The work presented here is part of, and a prerequisite for, a project we have undertaken to create a corpus for Arabic text processing. It is an attempt to create modules automatically that would help speed up the process of classification for any text categorization task. It also serves as a tool for...

Yandex School of Data Analysis approach to English-Turkish translation at WMT16 News Translation Task

We describe the English-Turkish and Turkish-English translation systems submitted by Yandex School of Data Analysis team to WMT16 news translation task. We successfully applied hand-crafted morphological (de-)segmentation of Turkish, syntax-based pre-ordering of English in English-Turkish and post-ordering of English in Turkish-English. We perform desegmentation using SMT and propose a simple y...

Personalized News Categorization Through Scalable Text Classification

Existing news portals on the WWW aim to provide users with numerous articles that are categorized into specific topics. Such a categorization procedure improves presentation of the information to the end-user. We further improve usability of these systems by presenting the architecture of a personalized news classification system that exploits the user’s awareness of a topic in order to classify th...

A New Approach for Semi-supervised Online News Classification

Due to the dramatic increase of information on the Web, text categorization has become a useful tool for organizing information. The traditional text categorization problem uses a training set from online sources with predefined class labels for text documents. Typically, a large amount of online training news must be provided in order to learn a satisfactory categorization scheme. We investigate...

Spotting scientific and technical specialization in biomedical documents using morphological clues

Distinguishing the specialization level of health documents on the Internet is important, especially when documents are read by non-expert users such as patients. Indeed, highly technical documents prevent patients from understanding the content and may have negative consequences for their health care process and their communication with medical doctors. When medical portals...

Publication date: 2013